Bootstrap Thompson Sampling and Sequential Decision Problems in the Behavioral Sciences

نویسندگان
چکیده

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Thompson sampling with the online bootstrap

Thompson sampling provides a solution to bandit problems in which new observations are allocated to arms with the posterior probability that an arm is optimal. While sometimes easy to implement and asymptotically optimal, Thompson sampling can be computationally demanding in large scale bandit problems, and its performance is dependent on the model fit to the observed data. We introduce bootstr...

متن کامل

Generalized Thompson sampling for sequential decision-making and causal inference

*Correspondence: [email protected]; [email protected] 1GRASP Laboratory, Electrical and Systems Engineering Department, University of Pennsylvania, Philadelphia, PA 19104, USA 2Max Planck Institute for Biological Cybernetics and Max Planck Institute for Intelligent Systems, Speemanstrasse 38, Tübingen 72076, Germany Abstract Purpose: Sampling an action according to the probability ...

متن کامل

Complex Bandit Problems and Thompson Sampling

We study stochastic multi-armed bandit settings with complex actions derived from the basic bandit arms, e.g., subsets or partitions of basic arms. The decision maker is faced with selecting at each round a complex action instead of a basic arm. We allow the reward of the complex action to be some function of the basic arms’ rewards, and so the feedback observed may not necessarily be the rewar...

متن کامل

Erratum to: Generalized Thompson sampling for sequential decision-making and causal inference

*Correspondence: [email protected]; [email protected] 1GRASP Laboratory, Electrical and Systems Engineering Department, University of Pennsylvania, Philadelphia, PA 19104 USA 2Max Planck Institute for Biological Cybernetics and Max Planck Institute for Intelligent Systems, Speemanstrasse 38, Tübingen 72076 Germany Decisions in the presence of latent variables We correct errors in e...

متن کامل

Thompson Sampling for Complex Online Problems

We consider stochastic multi-armed bandit problems with complex actions over a set of basic arms, where the decision maker plays a complex action rather than a basic arm in each round. The reward of the complex action is some function of the basic arms’ rewards, and the feedback observed may not necessarily be the reward perarm. For instance, when the complex actions are subsets of the arms, we...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: SAGE Open

سال: 2019

ISSN: 2158-2440,2158-2440

DOI: 10.1177/2158244019851675